In this report, we aim to estimate the association between the genomic offset predictions and mortality rates in natural populations, a proxy of the absolute fitness. A positive association between the genomic offset predictions and the mortality rates would suggest that the genomic offset approach captures maladaptation in the face of demographic complexity (i.e., the effects of other processes on spatial variance in allele frequencies, such as expansion history and gene flow) as well as the genetic architecture of climate adaptation (polygenic trait architectures, G×E interactions, nonadditive genetic variance).
For that, we use mortality data from the French and Spanish National Forest Inventories (NFI) harmonized in Changenet et al. (2021). The French data relies on temporary plots sampled between 2005 and 2014 while the Spanish data relies on permanent plots sampled during the second (from 1986 to 1996) and third NFIs (from 1997 to 2008). A tree was recorded as dead if its death was dated at less than 5 years ago in the French NFI or if it was alive in the second inventory but dead in the third one in the Spanish NFI.
In order to account for the different census intervals between inventories, we modeled the proportion \(p_{i}\) of maritime pines that died in the plot \(i\) during the census interval with the complementary log-log link and an offset on the logarithm of the census interval \(\Delta_{i}\) for the plot \(i\), as follows:
with \(N_{i}\) the total number of maritime pines in the plot \(i\), \(m_{i}\) the number of maritime pines that died during the census interval \(\Delta_{i}\) in the plot \(i\), \(C_{i}\) the basal area of all tree species confounded in the plot \(i\) (to account for the competition among trees) and \(GO_{i}\) the estimated genomic offset in the plot \(i\). We used country-specific intercepts \(\beta_{0,c}\) to account for the methodological differences between the French and Spanish inventories that may bias the estimations.
The building and evaluation of the statistical model was done in the report 14_ValidationNFI_ModelComparison.qmd.
NFI data
This dataset was harmonized in Changenet et al. (2021).
Code
nfi_data <-readRDS(here::here("data/NFIdata/NFIrawdata_Changenet2021.rds")) %>%droplevels() %>%as_tibble() %>% dplyr::select(plotcode, longitude, latitude, country, yearsbetweensurveys, # number of years between surveys surveydate1, # date first survey Spanish NFI surveydate, # year of the second survey in the Spanish NFI and the unique survey in the French NFI treeNbrJ.M, # number of dead trees in the plot treeNbrJ.IMall, # total number of trees in the plot BA.ha.plot.1# basal area of all tree species in the plot ) %>% dplyr::rename(nb_years = yearsbetweensurveys,nb_dead = treeNbrJ.M,nb_tot = treeNbrJ.IMall,basal_area = BA.ha.plot.1,first_survey = surveydate1,second_survey = surveydate) %>% dplyr::mutate(first_survey =case_when(!is.na(first_survey) ~str_sub(first_survey,1,4) %>%as.numeric()), # keep only the year of the 1st surveyprop_dead = nb_dead/nb_tot, # proportion of dead treesannual_prop_dead = (nb_dead/nb_tot)/nb_years) # annual proportion of dead trees
Variables in the dataset
Plot code: plotcode
Longitude and latitude of the plot: longitude and latitude.
Country in which the plot is: country (ES = Spain, FR = France).
The number of years between surveys in the Spanish inventory (which is equal to 5 in the French inventory as mortality is estimated in the five years before the survey date): nb_years.
The dates of the first and second survey in the Spanish inventory: surveydate1 and surveydate2.
The year of the second survey in the Spanish inventory and of the unique survey in the French inventory: surveydate.
The number of dead trees in the plot: nb_dead.
The total number of trees in the plot: nb_tot.
The basal area of all tree species in the plot (proxy of the competition among trees): basal_area.
12610 plots in this dataset: 4097 from the French NFI and 8513 from the Spanish inventory.
No maritime pine was recorded in these eight plots (nb_dead = 0 and nb_tot = 0). We check that this is the case for the 664 plots by looking at the unique values in the columns nb_dead and nb_tot in the subset of plots with missing data for the proportion of dead trees (the df_na_prop_dead dataset).
nfi_data %>%ggplot(aes(x=nb_years, fill=country)) +geom_histogram(alpha=0.5, position="identity") +labs(y="Count of plots",x="Number of years between surveys") +theme_bw()
We are going to extract the climatic data between 10 years before the first survey until the date of the second survey. For the French NFI, as there is only one survey, we will extract climatic data for the 15 years before the survey.
So, we need annual climatic data at the location of the NFI plots for the period 1972-2014.
Calculate climate for each plot
Once we have the annual climatic data at the location of the NFI plots, we generate two datasets:
one with the past climate data of the NFI plots, across the period 1901-1950.
one with the climatic data across a period specific to each plot, corresponding to:
The period from 5 years before the first inventory date to the second inventory date for the Spanish NFI.
The period from 10 years before the inventory date and the inventory date for the French NFI. A tree is considered dead in the French NFI if its death is estimated as less than 5 years before the inventory date.
We included the 5 years preceding the estimated date of tree death, as the climate of the previous years may have lag effects on tree death.
We load the annual climatic data, which corresponds to point estimate at the location of the NFI plots:
We calculate the average climatic conditions across the period 1901-1950:
Code
# Function to calculate the average of the climatic variables at the location of the NFI plotssource(here("scripts/functions/calc_avg_clim_var.R"))clim_ref <- clim %>%calc_avg_clim_var(ref_period =c(1901,1950), id_spatial_points ="plotcode")
Code
# !!Coding warning!! ####################### The coordinates from ClimateDT are not exactly the same as the original ones. # Therefore, merging the climatic data with the original data has to be done with the `plotcode` column # and not the latitude and longitude of the populations.# Comparing original coordinates with the ones from ClimateDTcomp_coord <- clim_ref %>% dplyr::select(plotcode,latitude,longitude) %>% dplyr::rename(latitude_climDT=latitude, longitude_climDT=longitude)comp_coord <- nfi_data %>% dplyr::select(plotcode,latitude,longitude) %>%inner_join(comp_coord, by="plotcode") %>%mutate(diff_latitude=latitude_climDT - latitude,diff_longitude=longitude_climDT - longitude) %>% dplyr::filter(diff_latitude !=0| diff_longitude !=0)# There are very very small differencesrange(comp_coord$diff_latitude)range(comp_coord$diff_longitude)
Code
# checking time periods for Maurizio# ==================================time_periods <- clim_survey %>% dplyr::select(min_year_clim,second_survey) %>%group_by(min_year_clim,second_survey) %>%summarise(count=n()) time_periods %>%print(n=nrow(.)) %>%kable_mydf()time_periods %>%filter(count <10)time_periods %>%write_csv(here("data/ClimaticData/NFIplots/NFIplots_TimePeriods.csv"))
We then calculate the average climatic conditions across the survey periods specific to each NFI plot:
Code
# We calculate the mean climatic values between:# - five years before the first inventory date# - the second inventory dateclim_survey <- nfi_data %>%mutate(min_year_clim =case_when(country=="FR"~ second_survey-10, country=="ES"~ first_survey-5))# we calculate the average climatic conditions for each specific periodclim_survey <- clim_survey %>%group_by(min_year_clim,second_survey) %>%group_split() %>% purrr::map(\(x){min_year <-unique(x$min_year_clim)max_year <-unique(x$second_survey)clim %>%right_join(x[,"plotcode"],by=c("plotcode")) %>%calc_avg_clim_var(ref_period =c(min_year,max_year), id_spatial_points ="plotcode")}) %>%list_rbind() # we save in a list the average climatic conditions across the two periods (reference period and period of the NFI surveys)list(clim_ref = clim_ref,clim_survey = clim_survey) %>%saveRDS(here("data/ClimaticData/NFIplots/NFIclimate.rds"))
We save the averaged climatic data at the locations of the NFI plots in the DRYAD repository:
dataset ClimateDT_NFIPlots_PastClimates.csv for the averaged climates over the reference period 1901-1950.
dataset ClimateDT_NFIPlots_SurveyClimates.csv" for the averaged climates over the survey periods specific to each plot.
S4 class stanmodel 'anon_model' coded as follows:
data {
int N;
vector[N] log_nb_years; // Offset to account for different census intervals
vector[N] C; // Proxy of the competition among trees (i.e. basal area)
vector[N] GO; // Genomic offset
int nb_dead[N]; // Number of dead trees in the plot
int nb_tot[N]; // Total number of trees in the plot
int<lower=0> nb_country; // Number of countries
int<lower=0, upper=nb_country> country[N]; // Countries
}
parameters {
vector[nb_country] alpha_country;
real beta_GO;
real beta_C;
}
model {
nb_dead ~ binomial(nb_tot,inv_cloglog(alpha_country[country] + beta_GO * GO + beta_C * C + log_nb_years));
alpha_country ~ normal(0, 1);
beta_GO ~ normal(0, 1);
beta_C ~ normal(0, 1);
}
Running the model for each method (GDM, LFMM, GF and RDA) and each input (all SNPs - only for LFMM-, candidate SNPs and control SNPs):
Code
coefftab <-lapply(unique(df$method), function(method_i){ sub_data <- df %>%filter(method == method_i) %>%inner_join(nfi_data, by="plotcode")# Data in a list for Stan stanlist <-list(N =nrow(sub_data),nb_dead = sub_data$nb_dead,nb_tot=sub_data$nb_tot,GO = (sub_data$GO -mean(sub_data$GO))/sd(sub_data$GO),C = (sub_data$basal_area -mean(sub_data$basal_area))/sd(sub_data$basal_area),nb_country=length(unique(sub_data$country)),country=as.numeric(as.factor(sub_data$country)),log_nb_years=log(sub_data$nb_years))# Running stan modelmod <-sampling(stancode, data = stanlist, iter =2000, chains =4, cores =4,init=0, save_warmup =FALSE)# Save coefficients broom.mixed::tidyMCMC(mod,droppars =NULL,estimate.method ="median", # does not work, the mean is estimated, not the median# to get the median, robust = TRUEess = F,rhat = F,conf.int = T,conf.level =0.95) %>%#filter(str_detect(term, c('beta'))) %>% dplyr::rename(median=estimate,std_error=std.error,conf_low=conf.low,conf_high=conf.high) %>%mutate(method=method_i,method_type=unique(sub_data$method_type),method_input=unique(sub_data$method_input),.before=1) }) %>%bind_rows()coefftab %>%saveRDS(file=here("outputs/ValidationNFI/coefftab.rds"))
The coefficients
Below, we plot the 95% credible intervals of:
the \(\beta_{GO}\) coefficients, which stand for the association between mortality rates in NFI plots and the genomic offset predictions (i.e. capturing the potential maladaptation of the populations at the location of the NFI plots).
the \(\beta_C\) coefficients, which stands for the association between the tree basal area (all species confounded) in the NFI plots and the mortality rates.
As expected, the basal area is positively associated with mortality rates.
For the four methods (GDM, LFMM, GF and RDA), populations with higher genomic offset based on the candidate SNPs (and all SNPs for LFMM) had higher mortality rates in the NFI plots. Interestingly, this was not the case with genomic offset predictions based on the control SNPs: for GF and LFMM, populations with higher genomic offset based on the control SNPs also shoed higher mortality rates but for GDM and RDA, populations with higher genomic offset showed lower mortality rates in the NFI plots. This confirms the importance of selecting a set of alleles potentially under selection before using the genomic offset predictions to predict maladaptation in natural populations (or using all SNPs with the LFMM approach).
We can also note the very high association between mortality rates and the genomic offset predictions based on candidate SNPs and from the GF and GDM methods.
GO predictions in Galicia
GO predictions in Galicia (especially those obtained with GF) show a strong boundary between two zones (see Figure 4b of the manuscript). How can we explain this pattern?
In Galicia, GO predictions from the GF approach (see Figure 4b of the manuscript) are higher for plots surveyed for the second time in 1998 than plots surveyed for the second time in 1997.
We explore the climatic differences between the two sets of plots, ie plots surveyed in 1997 vs 1998. First, we extract the annual climatic values at the location of the Galician plots sampled in 1997 and 1998 and plot their values across the study period (ie between the first and second surveys).
Code
# Extracting annual climatic values at the location of the NFI plots in Galicia surveyed for the second time inm 1997 or 1998subclim <- clim %>%left_join(nfi_data, by=c("plotcode","longitude","latitude")) %>% dplyr::filter(second_survey %in%1997:1998) %>% dplyr::filter(year %in%min(first_survey):1998) %>% dplyr::select(second_survey,year,all_of(clim_var))# violin_plots <- lapply(clim_var, function(x) {# # var_name <- extract_climatedt_metadata(var_clim = x) %>% # mutate(var_legend= paste0(description, " (", label,"; ",unit_symbol,")")) %>% # pull(var_legend)# # subclim %>% # mutate(year=as.factor(year)) %>% # ggplot(aes_string(y=x,x="year")) + # geom_jitter(shape=16,aes(colour=factor(second_survey)),position=position_jitter(0.2),size=2,alpha=0.3) +# xlab("") + # scale_color_manual(name="Year of the second survey",# values = c("yellow2","darkorchid")) +# ggtitle(var_name) +# ylab("mean climate under survey period - mean past climate") +# theme_bw()# # })violin_plots <-lapply(clim_var, function(x) {var_name <-extract_climatedt_metadata(var_clim = x) %>%mutate(var_legend=paste0(description, " (", label,"; ",unit_symbol,")")) %>%pull(var_legend)subclim %>%mutate(year=as.factor(year),second_survey=as.factor(second_survey)) %>%ggplot(aes_string(y=x,x="year",color="second_survey")) +geom_point(shape=16,position=position_jitter(0.2),size=2,alpha=0.1) +geom_violin(alpha=0.3,fill="white", linewidth=0.7) +xlab("") +ylab("") +ggtitle(var_name) +scale_color_manual(name="Year of the second survey",values =c("yellow2","darkorchid")) +theme_bw() +guides(colour =guide_legend(override.aes =list(size=2)))})
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation ideoms with `aes()`
Code
violin_plots
From these plots, it seems that the plots sampled in 1997 have lower isothermality (bio3) and lower temperature seasonality (bio4).
We also compare the mean climatic values of the reference and survey periods between the plots surveyed in 1997 and 1998. For that, we calculate for each plot (and each climatic variable) the difference between its mean climate across the reference period (ie 1901-1950) and its mean climate across the survey period (between the first and second survey period).
Code
# We load the datasets with mean climatic values across the reference and survey periods# Then we calculate the difference between the mean climates of the two periods for each plot# We then keep only plots surveyed (second survey) in 1997 or 1998 (Galician plots)subnficlim <-readRDS(here("data/ClimaticData/NFIplots/NFIclimate.rds")) %>%bind_rows() %>%group_by(plotcode,longitude,latitude,elevation) %>%summarise(across(tmn01:Eref, diff)) %>%ungroup() %>%left_join(nfi_data, by=c("plotcode","longitude","latitude")) %>% dplyr::filter(second_survey %in%1997:1998)bg_color <-"gray44"# background colorviolin_plots <-lapply(clim_var, function(x) {var_name <-extract_climatedt_metadata(var_clim = x) %>%mutate(var_legend=paste0(description, " (", label,"; ",unit_symbol,")")) %>%pull(var_legend)subnficlim %>%mutate(second_survey=as.factor(second_survey)) %>%ggplot(aes_string(y=x,x="second_survey")) +geom_violin(alpha=0.2, fill="gray76", color="darkgreen") +geom_jitter(shape=16,position=position_jitter(0.2),size=2,alpha=0.4) +xlab("") +ggtitle(var_name) +ylab("mean climate under survey period - mean past climate") +theme_bw()})violin_plots
These plots show that plots surveyed in 1998 show higher increase in the mean annual temperature (bio1), lower increase in annual precipitation (bio12) and stronger decrease in temperature seasonality (bio4) than plots surveyed in 1997. As temperature seasonality is the most important variable to explain the allelic turnover in GF, it may explain the strong differences in GO predictions among plots surveyed in 1997 and 1998.
Changenet, Alexandre, Paloma Ruiz-Benito, Sophia Ratcliffe, Thibaut Fréjaville, Juliette Archambeau, Annabel J Porte, Miguel A Zavala, Jonas Dahlgren, Aleksi Lehtonen, and Marta Benito Garzón. 2021. “Occurrence but Not Intensity of Mortality Rises Towards the Climatic Trailing Edge of Tree Species Ranges in European Forests.”Global Ecology and Biogeography.